Sequence-based predictive modeling to identify cancerlectins
نویسندگان
چکیده
Lectins are a diverse type of glycoproteins or carbohydrate-binding proteins that have a wide distribution to various species. They can specially identify and exclusively bind to a certain kind of saccharide groups. Cancerlectins are a group of lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumor. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promote the study on pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracies of most of these techniques are very limited. In this work, by constructing a benchmark dataset based on the CancerLectinDB database, a new amino acid sequence-based strategy for feature description was developed, and then the binomial distribution was applied to screen the optimal feature set. Ultimately, an SVM-based predictor was performed to distinguish cancerlectins from non-cancerlectins, and achieved an accuracy of 77.48% with AUC of 85.52% in jackknife cross-validation. The results revealed that our prediction model could perform better comparing with published predictive tools.
منابع مشابه
Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology
Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been u...
متن کاملPredicting cancerlectins by the optimal g-gap dipeptides
The cancerlectin plays a key role in the process of tumor cell differentiation. Thus, to fully understand the function of cancerlectin is significant because it sheds light on the future direction for the cancer therapy. However, the traditional wet-experimental methods were money- and time-consuming. It is highly desirable to develop an effective and efficient computational tool to identify ca...
متن کاملA Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique
Cancerlectins have an inhibitory effect on the growth of cancer cells and are currently being employed as therapeutic agents. The accurate identification of the cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the RF (Random Forest) algorithm is proposed for further improving the performance of identifying cancerle...
متن کاملPredictive Risk Mapping of Leptospirosis for North of Iran Using Pseudo-absences Data
Leptospirosis is a common zoonosis disease with a high prevalence in the world and is recognized as an important public health drawback in both developing and developed countries owing to epidemics and increasing prevalence. Because of the high diversity of hosts that are capable of carrying the causative agent, this disease has an expansive geographical reach. Various environmental and social ...
متن کاملHybrid model predictive control of a nonlinear three-tank system based on the proposed compact form of piecewise affine model
In this paper, a predictive control based on the proposed hybrid model is designed to control the fluid height in a three-tank system with nonlinear dynamics whose operating mode depends on the instantaneous amount of system states. The use of nonlinear hybrid model in predictive control leads to a problem of mixed integer nonlinear programming (MINLP) which is very complex and time consuming t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 8 شماره
صفحات -
تاریخ انتشار 2017